K2/Kleisli and GUS: Experiments in integrated access to genomic data sources

Authors

  • Susan B. Davidson
  • Jonathan Crabtree
  • Brian P. Brunk
  • Jonathan Schug
  • Val Tannen
  • G. Christian Overton
  • Christian J. Stoeckert
Abstract

The integration of heterogeneous data sources and software systems is a major issue in the biomedical community, and several approaches have been explored: linking databases, "on-the-fly" integration through views, and integration through warehousing. In this paper we report on our experiences with two systems that were developed at the University of Pennsylvania: an integration system called K2, which has primarily been used to provide views over multiple external data sources and software systems; and a data warehouse called GUS, which downloads, cleans, integrates and annotates data from multiple external data sources. Although the view and warehouse approaches each have their advantages, there is no clear "winner". Therefore, users must consider how the data is to be used, what the performance guarantees must be, and how much programmer time and expertise is available to choose the best strategy for a particular application.

Comments: Postprint version. Published in IBM Systems Journal, Volume 40, Issue 2, March 2001, pages 512-531. Publisher URL: http://search.ebscohost.com/login.aspx?direct=true&db=aph&AN=4628447&site=ehostlive

This journal article is available at ScholarlyCommons: http://repository.upenn.edu/db_research/23

Author affiliation: Center for Bioinformatics, Dept. of Computer and Information Science, University of Pennsylvania, Philadelphia, PA 19104

Similar resources

Estimating the accuracy of genomic selection in small genetic populations: a simulation study

In the present study, two genetically connected populations, one small and one large, were simulated, and the effect of different sources of information from foreign populations on the accuracy of predicted genomic breeding values of young animals in the small population was investigated. The large population consisted of 200000 animals over 15 generations and the small population consisted of 5000 animals over 3 ...

The Kleisli Approach to Data Transformation and Integration

Kleisli is a data transformation and integration system that can be used for any application where the data is typed, but has proven especially useful for bioinformatics applications. It extends the conventional flat relational data model supported by the query language SQL to a complex object data model supported by the collection programming language CPL. It also opens up the closed nature of c...

Ontology Based Integration of Distributed and Heterogeneous Data Sources in ACGT

In this work, we describe the set of tools comprising the Data Access Infrastructure within Advancing Clinico-genomic Trials on Cancer (ACGT), an R&D project funded in part by the European. This infrastructure aims at improving post-genomic clinical trials by providing seamless access to integrated clinical, genetic, and image databases. A data access layer, based on OGSA-DAI, has been developed...

Construction of Biological Databases: a Case Study on the Protein Phosphatase Database (ppdb)

Biological data is being created at ever-increasing rates as different high-throughput technologies are implemented for a wide variety of discovery platforms. It is crucial for researchers to be able not only to access this information but also to integrate it well and synthesize new holistic ideas about various topics. A key ingredient in this process of data-driven knowledge-based discovery is...

Speeding up incremental view maintenance using the cuckoo algorithm

A data warehouse is a repository of integrated data collected from various sources. It can maintain data from those sources in the form of views, so the views must be maintained and updated as the sources change. Since an increase in updates may cause costly overhead, it is necessary to update views with high accuracy. Optimal Delta Evaluation method is...

Journal title:
  • IBM Systems Journal

Volume 40, Issue 2

Pages 512-531

Publication date 2001